Clustering Big Urban Dataset
Authors
Abstract
Cities are producing and collecting massive amounts of data from various sources such as transportation networks, the energy sector, smart homes, tax records, surveys, LIDAR data, and mobile phone sensors. All of the aforementioned data, when connected via the Internet, fall under the Internet of Things (IoT) category. To use such a large volume of data for scientific computing, it is important to store and analyze urban data using efficient computing resources and algorithms. However, this can be problematic due to many challenges. This article explores some of these challenges and tests the performance of two partitional algorithms for clustering Big Urban Datasets, namely K-Means and Fuzzy C-Means (FCM). Clustering Big Urban Data into a compact format that represents the information of the whole dataset can help researchers work with the reorganized data much more efficiently. Our experiments conclude that FCM outperformed K-Means when presented with this type of dataset; however, the latter is lighter on hardware utilisation.
Similar resources
Evaluation of Updating Methods in Building Blocks Dataset
With the increasing use of spatial data in daily life, the production of this data from diverse information sources with different precision and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one solution to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...
Projective Low-rank Subspace Clustering via Learning Deep Encoder
Low-rank subspace clustering (LRSC) has been considered the state-of-the-art method on small datasets. LRSC constructs a desired similarity graph by low-rank representation (LRR), and employs spectral clustering to segment the data samples. However, effectively applying LRSC to clustering big data becomes a challenge because both LRR and spectral clustering suffer from high computational...
Tailoring Fuzzy C-Means Clustering Algorithm for Big Data Using Random Sampling and Particle Swarm Optimization
As one of the most common data mining techniques, clustering has been widely applied in many fields, among which fuzzy clustering can reflect the real world in a more objective perspective. As one of the most popular fuzzy clustering algorithms, Fuzzy C-Means (FCM) clustering combines the fuzzy theory and K-Means clustering algorithm. However, there are some issues with FCM clustering. For exam...
A Probabilistic Embedding Clustering Method for Urban Structure Detection
Urban structure detection is a basic task in urban geography. Clustering is a core technology for detecting patterns of urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets, recording information such as human behaviour and human social activity, suffer from the complexity of high dimensionality and high noise. And unfortunately, the state-of-the-art cl...
An Ensemble Clustering for Mining High-dimensional Biological Big Data
Clustering of high-dimensional biological big data is an incredibly difficult and challenging task, as the data space is often too big and too messy. The conventional clustering methods can be inefficient and ineffective on high-dimensional biological big data, because traditional distance measures may be dominated by the noise in many dimensions. An additional challenge in biological big data is ...